feat: BYOE 0.8.0 - endpoints registry + openai-compat provider (REQ-142)#92
Merged
Conversation
Phase 1 of the Bring-Your-Own-Endpoint sprint. Adds a generic OpenAI-v1-compatible endpoint registry so users can register self-hosted vLLM, llama.cpp server, LM Studio, and TGI backends and pick between them.

- `src/specsmith/agent/endpoints.py`: `Endpoint` / `EndpointAuth` / `EndpointStore` / `EndpointHealth` dataclasses, `schema_version=1`, JSON persistence at `~/.specsmith/endpoints.json` (chmod 600), token resolution dispatch (none / bearer-inline / bearer-env / bearer-keyring), `/v1/models` health probe with TLS verify toggle.
- `src/specsmith/cli.py`: `specsmith endpoints` group with `add` / `list` / `remove` / `default` / `test` / `models` subcommands. Inline-token redaction in `--json` output, optional bearer-keyring storage with hidden-input prompt, `--purge-keyring` on remove, `--set-default` on add.
- `tests/test_endpoints_store.py` + `tests/test_endpoints_cli.py`: 38 new tests covering validation, round-trip, redaction, token resolution dispatch, and `/v1/models` health against an in-process fake server.
- `tests/fixtures/api_surface.json`: registered `endpoints` as a top-level command for REQ-140 stability.
- `docs/site/endpoints.md`: BYOE walkthrough, auth strategy table, security notes, CLI reference.

Validation: ruff lint clean, ruff format clean, mypy strict clean for the new module, pytest 66/66 passing across the new suites + the existing api-surface stability test.

Co-Authored-By: Oz <oz-agent@warp.dev>
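For reference, a minimal sketch of what the token-resolution dispatch described above could look like, assuming Python 3.10+; the function name `resolve_token`, the field names, and the `"specsmith"` keyring service name are illustrative assumptions, not the module's actual API:

```python
import os
from dataclasses import dataclass


@dataclass
class EndpointAuth:
    # One of the four modes named above: "none", "bearer-inline",
    # "bearer-env", or "bearer-keyring".
    mode: str
    # Inline token, environment-variable name, or keyring entry id,
    # depending on the mode.
    value: str = ""


def resolve_token(auth: EndpointAuth) -> str | None:
    """Return the bearer token for an endpoint, or None for unauthenticated."""
    if auth.mode == "none":
        return None
    if auth.mode == "bearer-inline":
        return auth.value
    if auth.mode == "bearer-env":
        # Fails loudly if the configured variable is missing.
        return os.environ[auth.value]
    if auth.mode == "bearer-keyring":
        import keyring  # third-party; only needed for this mode

        return keyring.get_password("specsmith", auth.value)
    raise ValueError(f"unknown auth mode: {auth.mode!r}")
```

Dispatching on a mode string keeps the secret material out of the JSON file for the env and keyring paths, which matches the redaction behavior the CLI advertises for inline tokens.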
Phase 2 of the Bring-Your-Own-Endpoint sprint. Wires the registry from PR-1 into the chat surface and the persistent serve loop.

- `src/specsmith/agent/chat_runner.py`: new `_run_openai_compat` driver streams from a registered `Endpoint` via raw stdlib HTTP / SSE (no openai SDK dependency). `run_chat()` takes an optional `endpoint_id`; when set, the BYOE store is consulted and the resolved endpoint short-circuits the auto-detect provider chain. Failure modes (unreachable, 401, missing default model) fall back gracefully.
- `src/specsmith/cli.py`: `specsmith chat --endpoint <id>` threads through to `run_chat`. `specsmith serve --endpoint <id>` resolves the endpoint at startup, derives provider+model, and exports `SPECSMITH_ACTIVE_ENDPOINT` for downstream consumers.
- `tests/test_chat_runner_openai_compat.py`: 4 new pytest cases against an in-process fake `/v1/chat/completions` SSE server. Covers happy-path streaming, missing default-model fallback, 401-on-bad-token fallback, and the `run_chat` entry point with `endpoint_id` resolution.

Validation: ruff lint + format clean, 82/82 passing across the new + existing endpoint and warp parity suites.

Co-Authored-By: Oz <oz-agent@warp.dev>
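As an illustration, here is a self-contained sketch of streaming from an OpenAI-v1-compatible `/v1/chat/completions` endpoint over raw stdlib HTTP / SSE. The name `stream_chat` and the exact payload handling are assumptions, not the actual `_run_openai_compat` implementation:

```python
import json
import urllib.request


def stream_chat(base_url: str, model: str, prompt: str, token: str | None = None):
    """Yield content deltas from a /v1/chat/completions SSE stream."""
    body = json.dumps(
        {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        }
    ).encode()
    headers = {"Content-Type": "application/json"}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    req = urllib.request.Request(
        f"{base_url}/chat/completions", data=body, headers=headers
    )
    with urllib.request.urlopen(req) as resp:
        for raw_line in resp:
            line = raw_line.decode("utf-8").strip()
            if not line.startswith("data: "):
                continue  # skip keep-alives and blank separator lines
            payload = line[len("data: "):]
            if payload == "[DONE]":
                break
            delta = json.loads(payload)["choices"][0]["delta"].get("content")
            if delta:
                yield delta
```

Usage would look like `for chunk in stream_chat("http://10.0.0.4:8000/v1", "qwen2.5-coder", "hello"): print(chunk, end="")`, with `base_url` including the `/v1` prefix as in the registry examples below.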
Bump `pyproject.toml` to 0.8.0 to ship the Bring-Your-Own-Endpoint feature (REQ-142): the new endpoints store + `specsmith endpoints` CLI group (PR-1) and the openai-compat provider driver wired through `specsmith chat / serve --endpoint <id>` (PR-2).

Co-Authored-By: Oz <oz-agent@warp.dev>
What
Ships Bring-Your-Own-Endpoint (BYOE) support for OpenAI-v1-compatible LLM backends (vLLM, llama.cpp server, LM Studio, TGI, ...). Closes the user request to route Specsmith chat / serve through a self-hosted vLLM on the LAN.

Phases

PR-1 (endpoints registry):

- `src/specsmith/agent/endpoints.py`: `Endpoint` / `EndpointAuth` / `EndpointStore` / `EndpointHealth` dataclasses; `schema_version=1`; JSON persistence at `~/.specsmith/endpoints.json` with `chmod 600`; token resolution dispatch (`none` / `bearer-inline` / `bearer-env` / `bearer-keyring`); `/v1/models` health probe with TLS verify toggle (see the sketch after this list).
- `specsmith endpoints` group with `add` / `list` / `remove` / `default` / `test` / `models` subcommands. Inline-token redaction on `--json`, hidden-input prompt for the keyring path, `--purge-keyring` on remove.
- `docs/site/endpoints.md` walkthrough + `api_surface.json` registers `endpoints`.

PR-2 (openai-compat driver and the `--endpoint` flag):

- `_run_openai_compat` in `chat_runner.py` streams from the registered endpoint via raw stdlib HTTP / SSE (no openai SDK dependency). `run_chat` takes an optional `endpoint_id`; when set, the BYOE store is consulted and the resolved endpoint short-circuits the auto-detect provider chain. Failure modes (unreachable, 401, missing default model) fall back gracefully.
- `--endpoint <id>` flag on `specsmith chat` and `serve`. `serve` resolves the endpoint at startup, derives provider+model, and exports `SPECSMITH_ACTIVE_ENDPOINT`.
- New pytest cases against an in-process fake `/v1/chat/completions` SSE server.
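A minimal sketch of the `/v1/models` health probe with the TLS verify toggle; the name `probe_models` and the return shape are assumptions, and the real `EndpointHealth` plumbing is richer:

```python
import json
import ssl
import urllib.request


def probe_models(
    base_url: str, token: str | None = None, verify_tls: bool = True
) -> list[str]:
    """GET {base_url}/models and return the advertised model ids."""
    ctx = ssl.create_default_context()
    if not verify_tls:
        # The toggle exists for self-signed LAN deployments.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    req = urllib.request.Request(f"{base_url}/models")
    if token is not None:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, context=ctx, timeout=5) as resp:
        data = json.load(resp)
    # OpenAI-v1 shape: {"object": "list", "data": [{"id": ...}, ...]}
    return [m["id"] for m in data.get("data", [])]
```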
Validation

- `ruff check` + `ruff format --check` clean for the new and modified files.
- `mypy` clean for `src/specsmith/agent/endpoints.py` (the strict-mode tier).
- `pytest tests/test_endpoints_store.py tests/test_endpoints_cli.py tests/test_chat_runner_openai_compat.py tests/test_warp_parity_followup.py tests/test_warp_parity.py` → 82 passing.
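For flavor, a minimal sketch of the in-process fake-server pattern the suites use; the real fixtures differ, and `FakeModels` plus the reuse of the hypothetical `probe_models` from the earlier sketch are illustrative only:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeModels(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/v1/models":
            body = json.dumps({"data": [{"id": "qwen2.5-coder"}]}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep test output quiet


def test_probe_models_happy_path():
    # Port 0 lets the OS pick a free port, so tests never collide.
    server = HTTPServer(("127.0.0.1", 0), FakeModels)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base_url = f"http://127.0.0.1:{server.server_port}/v1"
        # probe_models is the sketch shown earlier in this description.
        assert probe_models(base_url) == ["qwen2.5-coder"]
    finally:
        server.shutdown()
```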
How to test on your workstation

1. Install `specsmith` 0.8.0 (or run the dev branch in editable mode).
2. Register the endpoint: `specsmith endpoints add --id home-vllm --name "Home vLLM" --base-url http://10.0.0.4:8000/v1 --default-model qwen2.5-coder --auth none --set-default`.
3. Health-check it: `specsmith endpoints test home-vllm`.
4. Chat through it: `specsmith chat --endpoint home-vllm "hello"`; the response now streams from your vLLM, not Ollama / Anthropic / OpenAI.
Out of scope (PR-3 in the extension repo)

The extension-side `specsmith.endpoints` / `specsmith.testEndpoint` commands. Those land separately as a sibling PR (feat(extension): BYOE 0.8.0 - endpoints commands + bridge --endpoint plumbing (REQ-142), specsmith-vscode#46).

Co-Authored-By: Oz <oz-agent@warp.dev>